Abstract



In relatively free word order languages, grammat- 
ical functions are intricately related to case mark- 
ing. Assuming an ordered representation of the 
predicate-argument structure, this work proposes a 
Combinatory Categorial Grammar formulation of 
relating surface case cues to categories and types 
for correctly placing the arguments in the predicate- 
argument structure. This is achieved by assign- 
ing case markers GF-encoding categories. Unlike 
other CG formulations, type shifting does not pro- 
liferate or cause spurious ambiguity. Categories of 
all argument-encoding grammatical functions fol- 
low from the same principle of category assignment. 
Normal order evaluation of the combinatory form 
reveals the predicate-argument structure. The appli- 
cation of the method to Turkish is shown. 

1 Introduction 



Recent theorizing in linguistics has brought forth a level
of representation called the Predicate-Argument Structure
(PAS). PAS acts as the interface between lexical semantics
and d-structure in GB (Grimshaw, 1990), functional structure
in LFG (Alsina, 1996), and complement structure in HPSG
(Wechsler, 1995). PAS is the sole level of representation in
Combinatory Categorial Grammar (CCG) (Steedman, 1996). All
formulations assume a prominence-based structured
representation for PAS, although they differ in the terms
used for defining prominence. For instance, Grimshaw (1990)
defines the thematic hierarchy as:

Agent > Experiencer > Goal / Location / Source > Theme



* Thanks to Mark Steedman for discussion and material, and 
to the anonymous reviewer of an extended version whose com- 
ments led to significant revisions. This research is supported 
by TUBITAK (EEEAG-90) and NATO Science Division (TU- 
LANGUAGE). 



whereas LFG accounts make use of the following
(Bresnan and Kanerva, 1989):

Agent > Beneficiary > Goal / Experiencer > Inst
> Patient / Theme > Locative.

As an illustration, the predicate-argument structures of the
agentive verb murder and the psychological verb fear are
(Grimshaw, 1990, p. 8):

murder (x     (y))
        Agent  Theme

fear (x   (y))
      Exp  Theme

To abstract away from language-particular case systems and
mapping of thematic roles to grammatical functions, I assume
the Applicative Hierarchy of Shaumyan (1987) for the
definition of prominence:

Primary Term > Secondary Term > Tertiary Term > Oblique Term.

Primacy of a term over another is defined by the former
having a wider range of syntactic features than the latter.
In an accusative language, subjects are less marked (hence
primary) than objects; all verbs take subjects but only
transitive verbs take objects. Terms (=arguments) can be
denoted by the genotype indices on NPs, such as NP1, NP2 for
primary and secondary terms.[1] An NP2 would be a direct
object (NPacc) in an accusative language, or an
ergative-marked NP (NPerg) in an ergative language. This
level of description also simplifies the formulation of
grammatical function changing; the primary term of a
passivized predicate (PASS p) is the secondary term of the
active p. I follow Shaumyan and Steedman (1996) also in the
ordered representation of the PAS (1). The reader is referred
to (Shaumyan, 1987) for linguistic justification of this
ordering.



(1) Pred ... <Sec. Term> <Primary Term>

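[1] Shaumyan uses different symbols for these terms; we prefer
NP1, NP2 for easier exposition in later formulations.

Given this representation, the surface order of constituents
is often in conflict with the order in the PAS. For instance,
English as a configurational SVO language has the mapping:

(2) [Diagram lost in extraction: the surface positions S, V, O
of English mapped onto the ordered PAS of (1).]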




However, in a non-configurational language, per- 
mutations of word order are possible, and grammat- 
ical functions are often indicated not by configura- 
tions but by case marking. For instance, in Turkish, 
all six permutations of the basic SOV order are pos- 
sible, and Japanese allows two verb-final permuta- 
tions of underlying SOV. The relationship between 
case marking and scrambling is crucial in languages 
with flexible word order. A computational solution 
to the problem must rely on some principles of par- 
simony for representing categories and types of ar- 
guments and predicates, and efficiency of process- 
ing. 

In a categorial formulation, grammatical functions of
preverbal and postverbal NPs in (2) can be made explicit by
type shifting[2] the subject to S/(S\NP1) and the object to
(S\NP1)\((S\NP1)/NP2). These categories follow from the
order-preserving type shifting scheme (Dowty, 1988):

(3) NP ⇒ T/(T\NP) or T\(T/NP)

To resolve the opposition between surface order and the PAS
in a free word order language, one can let the type shifted
categories of terms proliferate, or reformulate CCG in such a
way that arguments of the verbs are sets, rather than lists
whose arguments are made available one at a time. The former
alternative makes the spurious ambiguity problem of CG
parsing (Karttunen, 1989) even more severe. Multiset CCG
(Hoffman, 1995) is an example of the set-oriented approach.
It is known to be computationally tractable but less
efficient than the polynomial time CCG algorithm of
Vijay-Shanker and Weir (1993). I try to show in this paper
that the traditional curried notation of CG with type
shifting can be maintained to account for the Surface Form ↔
PAS mapping without leading to proliferation of argument
categories or to spurious ambiguity.

The categorial framework is particularly suited for this
mapping due to its lexicalism. Grammatical functions of the
nouns in the lexicon are assigned by case markers, which are
also in the lexicon. Thus, grammatical function marking
follows naturally from the general CCG schema comprising
rules of application (A) and composition (B). The
functor-argument distinction in CG helps to model prominence
relations without extra levels of representation. The CCG
schema (Steedman, 1988; 1990) is summarized in (4).
Combinator notation is preferred here because combinators are
the formal primitives operating on the PAS (cf. Curry and
Feys (1958) for Combinatory Logic). Application is the only
primitive of the combinatory system; it is indicated by
juxtaposition in the examples and denoted by · in the normal
order evaluator (§4). B has the reduction rule
B f g a > f (g a).

[2] a.k.a. type raising, lifting, or type change.



(4)  X/Y: f    Y: a      ⇒  X: fa       (A>)
     Y: a      X\Y: f    ⇒  X: fa       (A<)
     X/Y: f    Y/Z: g    ⇒  X/Z: Bfg    (B>)
     Y\Z: g    X\Y: f    ⇒  X\Z: Bfg    (B<)
     X/Y: f    Y\Z: g    ⇒  X\Z: Bfg    (Bx>)
     Y/Z: g    X\Y: f    ⇒  X/Z: Bfg    (Bx<)
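
The application and composition rules above can be sketched on the syntactic side as follows. This is only an illustration under stated assumptions: the `Fn` class, the string atoms, and the `combine` function are hypothetical names introduced here, the semantic side (f, a, Bfg) is omitted, and the neutral slash | is assumed to have been instantiated to / or \ already.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Fn:
    """Functor category: result, slash ('/' or '\\'), argument (hypothetical helper)."""
    res: "Cat"
    slash: str
    arg: "Cat"


Cat = Union[str, Fn]  # atomic categories are plain strings such as "S" or "NP1"


def combine(left: Cat, right: Cat) -> Optional[tuple]:
    """Return (rule name, resulting category) for the first rule that applies, else None."""
    # A>:  X/Y  Y  =>  X
    if isinstance(left, Fn) and left.slash == "/" and left.arg == right:
        return ("A>", left.res)
    # A<:  Y  X\Y  =>  X
    if isinstance(right, Fn) and right.slash == "\\" and right.arg == left:
        return ("A<", right.res)
    if isinstance(left, Fn) and isinstance(right, Fn):
        # B>:  X/Y  Y/Z  =>  X/Z      Bx>:  X/Y  Y\Z  =>  X\Z
        if left.slash == "/" and left.arg == right.res:
            rule = "B>" if right.slash == "/" else "Bx>"
            return (rule, Fn(left.res, right.slash, right.arg))
        # B<:  Y\Z  X\Y  =>  X\Z      Bx<:  Y/Z  X\Y  =>  X/Z
        if right.slash == "\\" and right.arg == left.res:
            rule = "B<" if left.slash == "\\" else "Bx<"
            return (rule, Fn(right.res, left.slash, left.arg))
    return None


IV = Fn("S", "\\", "NP1")    # S\NP1
TV = Fn(IV, "\\", "NP2")     # S\NP1\NP2
print(combine(Fn("S", "/", IV), TV))  # type shifted subject + backward-looking verb
```

With these assumed encodings, combining S/(S\NP1) with S\NP1\NP2 goes through forward crossing composition and gives the backward-looking S\NP2.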



2 Grammatical Functions, Type Shifting, 
and Composition 

In order to derive all permutations of a ditransitive
construction in Turkish using (3), the dative-marked indirect
object (NP3) must be type shifted in 48 (4! × 2) different
ways so that coordination with the left-adjacent and the
right-adjacent constituent is possible. This is due to the
fact that the result category T is always a conjoinable type,
and the argument category T/NP3 (and T\NP3) must be allowed
to compose with the result category of the adjacent functor.
However, categories of arguments can be made more informative
about grammatical functions and word order. The basic
principle is as follows: The category assigned for argument n
must contain all and only the term information about NPi for
all i ≤ n. An NP2 type must contain in its category word
order information about NP1 and NP2 but not NP3. This can be
generalized as in (5):

(5) Category assignment for argument n:

    C(n) = NPn, or Tr/Ta, or Tr\Ta

    Ta = Lexical category of an NPn-governing element (e.g.,
    a verb) in the language whose highest genotype argument
    is NPn.

    Tr = The category obtained from Ta by removing NPn.
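
As a rough illustration of this category assignment, the sketch below derives the three categories for argument n from a toy lexicon. Everything named here (`LEXICON`, `show`, `paren`, `categories`) is a hypothetical helper, the lexicon is an assumed Turkish-like fragment, and slashes inside Ta and Tr are written as backslashes rather than the neutral slash for simplicity.

```python
# Assumed toy lexicon: for each n, the arguments of the lexical category whose
# highest genotype argument is NPn, in the order they appear in that category.
LEXICON = {
    1: ["NP1"],                # IV = S\NP1
    2: ["NP1", "NP2"],         # TV = S\NP1\NP2
    3: ["NP1", "NP3", "NP2"],  # DV = S\NP1\NP3\NP2
}


def show(args):
    """Render a verbal category S\\A1\\A2... from its argument list."""
    return "S" + "".join("\\" + a for a in args)


def paren(cat):
    """Parenthesize complex categories only."""
    return cat if cat == "S" else f"({cat})"


def categories(n):
    """Return [lower type, forward higher type, backward higher type] for argument n."""
    ta_args = LEXICON[n]
    tr_args = [a for a in ta_args if a != f"NP{n}"]  # Tr: Ta with NPn removed
    ta, tr = paren(show(ta_args)), paren(show(tr_args))
    return [f"NP{n}", f"{tr}/{ta}", f"{tr}\\{ta}"]


for n in (1, 2, 3):
    print(n, categories(n))
```

Under these assumptions, n = 3 yields (S\NP1\NP2)/(S\NP1\NP3\NP2) as the forward higher type, so the argument category is always one that the lexicon licenses.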

Case markers in Turkish are suffixes attached to noun
groups.[3] The types of case markers in the lexicon can be
defined as:

(6) Lexical type assignment for the case marker (-case)
    encoding argument n:

    -case := C(n): T(C(n)) x \ N: x

where T(C) denotes the semantic type for category C:

(7) a. T(NPn) = I (lower type for NPn)
    b. T(C) = T (if C is a type shifted category as in (5))
    c. T(C) = BBT (if C is a type shifted and composed
       category)

(5) and (6) are schemas that yield three lexical categories
per -case: one for the lower type, and two for higher types
which differ only in the directionality of the main function
due to (5). For instance, for the accusative case suffix
encoding NP2, we have:

-ACC := NP2: I x \ N: x
        ((S\NP1)/(S\NP1\NP2)): T x \ N: x
        ((S\NP1)\(S\NP1\NP2)): T x \ N: x

Type shifting alone is too constraining if the verbs take
their arguments in an order different from the Applicative
Hierarchy (1). For instance, the category of Turkish
ditransitives is S|NP1|NP3|NP2. Thus the verb has the
wrapping semantics C v', where C is the permutator with the
reduction rule C f g a > f a g. Type shifting an NP3 yields
(S\NP1\NP2)/(S\NP1\NP2\NP3), in which the argument category
is not lexically licensed. (5) is order-preserving in a
language-particular way; the result category always
corresponds to a lexical category in the language if the
argument category does too.

For arguments requiring a non-canonical order, we need type
shifting and composition (hence the third clause in (7)):

NP3: x  ⇒(T)  (S\NP1)/(S\NP1\NP3): T x
        ⇒(B)  (S\NP1\NP2)/(S\NP1\NP3\NP2): B(T x) = BBT x

Once the syntactic category of the argument is fixed, its
semantics is uniquely determined by (7).

[3] As suggested in (Bozsahin and Gocmen, 1995), morphological
and syntactic composition can be distinguished by associating
several attachment calculi with functors and arguments (e.g.,
affixation, concatenation, clitics, etc.)

The combinatory primitives operating on the PAS are I (7a),
T (7b-c), and B (7c). T has the reduction rule T a f > f a,
and I has I f > f. The use of T or B signifies that the
term's category is a functor; its correct place in the PAS is
yet to be determined. I indicates that the term is in the
right place in the partially derived PAS.

According to (5), there is a unique result-argument
combination for a higher type NP3, compared to 24 using (3).
(5) differs from (3) in another significant aspect: Tr and Ta
may contain directionally underspecified categories if
licensed by the lexicon. Directional underspecification is
needed when arguments of a verb can scramble to either side
of the verb. It is necessary in Turkish and Warlpiri but not
in Japanese or Korean. The neutral slash | is a lexical
operator; it is instantiated to either \ or / during parsing.
A crucial use of underspecification is shown in (8). SV
composition could not follow through if the verbs had
backward-looking categories; composition of the type shifted
subject and the verb in this case would only yield a
backward-looking S\NP2 by the schema (4).



(8) Adam       kurmuş      ama  çocuk      topladı  masa-yı
    man.NOM    set         but  child.NOM  gather   table-ACC
    S/(S\NP1)  S|NP1|NP2                            NP2
    ---------------------B>
            S/NP2                    S/NP2
    --------------------------------------------
                      S/NP2

    'The man had set the table but the child is cleaning it.'

The schema in (5) makes the arguments available in higher
types, and allows lower (NPn) types only if higher types fail
(as in NP2 in (8)). There are two reasons for this: Higher
types carry more information about surface order of the
language, and they are sufficient to cover bounded phenomena.
§3 shows how higher types correctly derive the PAS in various
word orders. Lower types are indispensable for unbounded
constructions such as relativization and coordination. The
choice is due to a concern for economy. If lower types were
allowed freely, they would yield the correct PAS as well:



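(9) S          IO         DO         V
    NP1: I s'  NP3: I i'  NP2: I o'  DV: C v'
                          --------------------A<
                          S\NP1\NP3: (C v')(I o')
               -------------------------------A<
               S\NP1: (C v')(I o')(I i')
    ------------------------------------------A<
    S: (C v')(I o')(I i')(I s') > v' i' o' s'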



(10) a. Mehmet      kitab-ı       oku-du
        M.NOM       book-ACC      read-PAST
        S/IV: T m'  IV/TV: T b'   TV: r'
                    ---------------------A>
                    IV: T b' r'
        ---------------------------------A>
        S: T m' (T b' r') > r' b' m'
        'Mehmet read the book.'



In parsing this is achieved as follows: An NPk can only be
the argument in a rule of application, and schema (6) is the
only way to obtain NPk from a noun group. Thus it suffices to
check in the application rules that if the argument category
is NPk, then the functor's result category (e.g., X in X/Y)
has none of the terms with genotype indices lower than k. NP2
in (8) is licensed because the adjacent functor is S/NP2. NP2
in (9) is not licensed because the adjacent functor has NP1.
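
The licensing check described above is simple enough to sketch directly. The function name `lower_type_licensed` and the string encoding of categories are assumptions made for this illustration; only the condition itself comes from the text.

```python
import re


def lower_type_licensed(k: int, functor_result: str) -> bool:
    """An NPk argument is licensed if the functor's result category contains
    no term whose genotype index is lower than k."""
    indices = [int(i) for i in re.findall(r"NP(\d+)", functor_result)]
    return all(i >= k for i in indices)


# Functor S/NP2 has result S: no terms at all, so the lower type NP2 is allowed.
print(lower_type_licensed(2, "S"))
# Functor S\NP1\NP3\NP2 has result S\NP1\NP3: NP1 is lower than NP2, so it is not.
print(lower_type_licensed(2, "S\\NP1\\NP3"))
```

The check only inspects the functor's result category, so it can sit inside the application rules without any lookahead.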

For noun-governed grammatical functions such as the genitive
(NP5), (5) licenses result categories that are underspecified
with respect to the genotype index. This is indeed necessary
because the resulting NP can be further inflected on case and
assume a genotype index. For Turkish, the type shifted
category is C(5) = NPagr/(NPagr\NP5). Hence the genitive
suffix bears the category C(5)\N. Agreement features enforce
the possessor-possessed agreement on person and number via
unification (as in UCG (Calder et al., 1988)):



kitab-i Mehmet oku-du 



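kalem   -in          uc      -u
pencil  -GEN.3s      tip     -POSS.3s
N: p'   C(5)\N: T    N: t'   (NPagr\NP5)\N: poss
-------------------A< ---------------------------A<
NPagr/(NPagr\NP5): T p'      NPagr\NP5: poss t'
-------------------------------------------------A>
NPagr: T p' (poss t') > (poss t') p'

'The tip of the pencil'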



3 Word Order and Scrambling 

Due to space limitations, the following abbre- 
viated categories are employed in derivations: 

IV = S\NPi 

TV = S\NPi\NP2 

DV = S\NPi\NP3\NP2 
The categories licensed by (5) can then be written as IV/TV
and IV\TV for NP2, TV/DV and TV\DV for NP3, etc. (10a-b) show
the verb-final variations in the word order. The bracketings
in the PAS and juxtaposition are left-associative; (fa)b is
the same as fab.



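(10) b. kitab-ı       Mehmet      oku-du
        IV/TV: T b'   S\IV: T m'  TV: r'
        ------------------------Bx<
        S/TV: B (T m') (T b')
        ---------------------------------A>
        S: B (T m') (T b') r' > r' b' m'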



(10a) exhibits spurious ambiguity. Forward composition of
S/IV and IV/TV is possible, yielding exactly the same PAS.
This problem is resolved by grammar rewriting in the sense
proposed by Eisner[4] (1996). Grammar rewriting can be done
using predictive combinators (Wittenburg, 1987), but they
cannot handle crossing compositions that are essential to our
method. Other normal form parsers, e.g. that of Hepple and
Morrill (1989), have the same problem. All grammar rules in
(4) in fact check the labels of the constituent categories,
which show how the category is derived. The labels are as in
(Eisner, 1996). -FC: Output of forward composition, of which
forward crossing composition is a special case. -BC: Output
of backward composition, of which backward crossing
composition is a special case. -OT: Lexical or type shifted
category. The goal is to block, e.g.,
X/Y-FC Y/Z-{FC, BC, OT} ⇒ X/Z and
X/Y-FC Y-{FC, BC, OT} ⇒ X in (10a). The S/TV composition
would have the label -FC, which cannot be an input to forward
application. In (10b), the backward composition follows
through since it has the category-label S/TV-BC, which the
forward application rule does not block. We use Eisner's
method to rewrite all rules in (4).
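
A minimal sketch of the label check may help here. The forward half follows the two blocked patterns given above; the backward half (a -BC functor may not feed a backward rule) is assumed by symmetry with Eisner's scheme and is not spelled out in the text. The names `rule_allowed` and `output_label` are hypothetical, and application outputs are given the label OT as in example (12a).

```python
FORWARD = {"A>", "B>", "Bx>"}
BACKWARD = {"A<", "B<", "Bx<"}


def rule_allowed(rule: str, left_label: str, right_label: str) -> bool:
    """Label check on the functor of a rule.

    Forward rules take their functor on the left and reject one labelled FC.
    Backward rules take their functor on the right and reject one labelled BC
    (assumed mirror image of the forward case).
    """
    if rule in FORWARD:
        return left_label != "FC"
    if rule in BACKWARD:
        return right_label != "BC"
    raise ValueError(f"unknown rule: {rule}")


def output_label(rule: str) -> str:
    """Label carried by the output of a rule."""
    return {"B>": "FC", "Bx>": "FC", "B<": "BC", "Bx<": "BC"}.get(rule, "OT")


# (10a): S/IV and IV/TV compose to S/TV-FC, which may not then apply forward to TV.
label = output_label("B>")
print(label, rule_allowed("A>", label, "OT"))
# (10b): S/TV-BC comes from backward crossing composition and may apply forward.
label = output_label("Bx<")
print(label, rule_allowed("A>", label, "OT"))
```

Keeping the check as a predicate on labels means it can be applied either in grammar templates or on chart entries, whichever the parser uses.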

(11a-b) show the normal form parses for post-verbal
scrambling, and (11c-d) for verb-medial cases.

[4] Eisner (1996, p. 81) in fact suggested that the labeling
system can be implemented in the grammar by templates, or in
the processor by labeling the chart entries.



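(11) a. oku-du      Mehmet      kitab-ı
        read-PAST   M.NOM       book-ACC
        TV: r'      S/IV: T m'  IV\TV: T b'
                    -----------------------Bx>
                    S\TV: B (T m') (T b')
        -----------------------------------A<
        S: B (T m') (T b') r' > r' b' m'
        'Mehmet read the book.'

     b. oku-du      kitab-ı      Mehmet
        TV: r'      IV\TV: T b'  S\IV: T m'
        ------------------------A<
        IV: T b' r'
        ------------------------------------A<
        S: T m' (T b' r') > r' b' m'

     c. kitab-ı      oku-du      Mehmet
        IV/TV: T b'  TV: r'      S\IV: T m'
        -------------------A>
        IV: T b' r'
        ------------------------------------A<
        S: T m' (T b' r') > r' b' m'

     d. Mehmet      oku-du      kitab-ı
        S/IV: T m'  TV: r'      IV\TV: T b'
                    -----------------------A<
                    IV: T b' r'
        -----------------------------------A>
        S: T m' (T b' r') > r' b' m'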

Controlled lexical redundancy of higher types, e.g., having
both (and only) IV/TV and IV\TV licensed by the lexicon for
an NP2, does not lead to alternative derivations in (10-11).
Assume that A/B B\C, where A/B and B\C are categories
produced by (5), gives a successful parse using the output
A\C. A\B B\C and A\B B/C are not composable types according
to (4). The other possible configuration, A/B B/C, yields an
A/C which looks for C in the other direction. Multiple
derivations appear to be possible if there is an
order-changing composition over C, such as C/C (e.g., a VP
modifier IV/IV). (12) shows two possible configurations with
a C on the right. (12b) is blocked by label check because
A/C-FC C ⇒(A>) A is not licensed by the grammar. If C were to
the left, only (12a) would succeed. Similar reasoning can be
used to show the uniqueness of derivation in other patterns
of directions.



(12) a. C/C   A/B   B\C   C
              ----------Bx>
              A\C-FC
        ----------------Bx<
        A/C-BC
        ---------------------A>
        A-OT

     b. C/C   A/B   B/C   C
              ----------B>
              A/C-FC

Constrained type shifting avoids the problem with freely
available categories in Eisner's normal form parsing scheme.
However, some surface characteristics of the language, such
as lack of case marking in certain constructions, put the
burden of type shifting on the processor (Bozsahin, 1997).
Lower type arguments such as NP2 pose a different kind of
ambiguity problem. Although they are required in unbounded
constructions, they may yield alternative derivations of
local scrambling cases in a labelled CCG. For instance, when
NP2 is peripheral in a ditransitive construction and the verb
can form a constituent with all the other arguments (S\NP2 or
S/NP2), the parser allows NP2. This is unavoidable unless the
parser is made aware of the local and non-local context. In
other words, this method solves the spurious ambiguity
problem between higher types, but not among higher and lower
types. One can try to remedy this problem by making the
availability of types dependent on some measures of
prominence, e.g., allowing subjects only in higher types to
account for subject-complement asymmetries. But, as pointed
out by Eisner (1996, p. 85), this is not spurious ambiguity
in the technical sense, just multiple derivations due to
alternative lexical category assignments. Eliminating
ambiguity in such cases remains to be solved.

4 Revealing the PAS 

The output of the parser is a combinatory form. The
combinators in this form may arise from the CCG schema, i.e.,
the compositor B, and the substitutor S (Steedman, 1987).
They may also be projected from the PAS of a lexical item,
such as the duplicator W (with the reduction rule
W f a > f a a) for reflexives, and B³C for predicate
composition with the causative suffix. For instance, the
combinatory form for (13a) is the expression (13b).

(13) a. Adam      çocuğ-a    kitab-ı    oku-t-tu
        man.NOM   child-DAT  book-ACC   read-CAUS-PAST
        : T m'    : T c'     : T b'     : B³ CAUSE C r'
        'The man had the child read the book.'

     b. T·m'·(B·(T·b')·(T·c')·(B³·CAUSE·C·r'))

Although B works in a binary manner in CCG to achieve
abstraction, it requires 3 arguments for full evaluation (its
order is 3). Revealing the PAS amounts to stripping off all
combinators from the combinatory form by evaluating the
reducible expressions (redexes). Bfg is not a redex but Bfga
is. In other words, the derivations by the parser must
saturate the combinators in order to reveal the PAS, which
should contain no combinators. PAS is the semantic normal
form of a derivation.

The sequence of evaluation is the normal order, which
corresponds to reducing the leftmost-outermost redex first
(Peyton Jones, 1987). In tree-theoretic terms, this is
depth-first reduction of the combinator tree in which the
rearrangement is controlled by the reduction rule of the
leftmost combinator, e.g., T m' X > X m' where X is the
parenthesized subexpression in (13b). Reduction by T yields:

B·(T·b')·(T·c')·(B³·CAUSE·C·r')·m' >    (1)

Further reductions eventually reveal the PAS:

T·b'·(T·c'·(B³·CAUSE·C·r'))·m' >        (2)
T·c'·(B³·CAUSE·C·r')·b'·m' >            (3)
B³·CAUSE·C·r'·c'·b'·m' >                (4)
CAUSE·(C·r'·c'·b')·m' >                 (5)
CAUSE·(r'·b'·c')·m'                     (6)



By the second Church-Rosser theorem, normal order evaluation
will terminate if the combinatory form has a normal form. But
Combinatory Logic has the same power as λ-calculus, and
suffers from the same undecidability results. For instance,
WWW has no normal form because the reductions will never
terminate. Some terminating reductions, such as
C I I b > b I, have no normal form either. It is an open
question as to whether such forms can be projected from a
natural language lexicon. In an expression X·Y where X is not
a redex, the evaluator recursively evaluates to reduce as
much as possible because X may contain other redexes, as in
(5) above. Recursion is terminated either by obtaining the
normal form, as in (6) above, or by equivalence check. For
instance, (C·(I·a)·b)·y recurses on the left subexpression to
obtain (C·a·b) then gives up on this subexpression since the
evaluator returns the same expression without further
evaluation.
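
The normal order evaluation described in this section can be sketched as a small term rewriter. This is an illustrative reconstruction, not the evaluator used in the paper: terms are assumed to be strings (atoms) or pairs `(function, argument)`, the names `ARITY`, `app`, `spine`, `contract`, `step`, and `normalize` are hypothetical, and a step limit stands in for the equivalence check mentioned above.

```python
# Number of arguments each combinator needs before it becomes a redex.
ARITY = {"I": 1, "T": 2, "W": 2, "B": 3, "C": 3, "B3": 5}


def app(*xs):
    """Left-associative application: app(f, a, b) is ((f, a), b)."""
    term = xs[0]
    for x in xs[1:]:
        term = (term, x)
    return term


def spine(term):
    """Unwind a term into its head atom and the list of its arguments."""
    args = []
    while isinstance(term, tuple):
        term, arg = term
        args.append(arg)
    return term, args[::-1]


def contract(head, a):
    """Apply the reduction rule of a saturated combinator."""
    if head == "I":
        return a[0]                                  # I f > f
    if head == "T":
        return app(a[1], a[0])                       # T a f > f a
    if head == "W":
        return app(a[0], a[1], a[1])                 # W f a > f a a
    if head == "B":
        return app(a[0], app(a[1], a[2]))            # B f g a > f (g a)
    if head == "C":
        return app(a[0], a[2], a[1])                 # C f g a > f a g
    return app(a[0], app(a[1], a[2], a[3], a[4]))    # B3 f g a b c > f (g a b c)


def step(term):
    """Perform one leftmost-outermost reduction; return None if there is no redex."""
    head, args = spine(term)
    n = ARITY.get(head)
    if n is not None and len(args) >= n:
        return app(contract(head, args[:n]), *args[n:])
    for i, arg in enumerate(args):                   # head is stuck: go left to right
        reduced = step(arg)
        if reduced is not None:
            return app(head, *args[:i], reduced, *args[i + 1:])
    return None


def normalize(term, limit=1000):
    """Reduce in normal order until no redex remains, or give up after `limit` steps."""
    for _ in range(limit):
        nxt = step(term)
        if nxt is None:
            return term
        term = nxt
    raise RuntimeError("no normal form found within the step limit")


# The combinatory form (13b), with B3 standing for the cubed compositor.
form = app("T", "m'", app("B", app("T", "b'"), app("T", "c'"),
                          app("B3", "CAUSE", "C", "r'")))
print(normalize(form) == app("CAUSE", app("r'", "b'", "c'"), "m'"))  # True
```

Under these assumptions the evaluator passes through the same intermediate forms as steps (1) to (6), while a form such as WWW exhausts the step limit instead of returning.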

5 Conclusion 

If an ordered representation of the PAS is assumed, as many
theories do nowadays, its derivation from the surface string
requires that the category assignment for case cues be rich
enough in word order and grammatical function information to
correctly place the arguments in the PAS. This work shows
that these categories and their types can be uniquely
characterized in the lexicon and tightly controlled in
parsing. The spurious ambiguity problem is kept under control
by normal form parsing on the syntactic side with the use of
labelled categories in the grammar. Thus, the PAS of a
derivation can be determined uniquely even in the presence of
type shifting. The same strategy can account for deriving the
PAS in unbounded constructions and non-constituent
coordination (Bozsahin, 1997).

The parser's output (the combinatory form) is reduced to a
PAS by normal order evaluation. Model-theoretic
interpretation can proceed in parallel with derivations, or
as a post-evaluation stage which takes the PAS as input.
Quantification and scrambling in free word order languages
interact in many ways, and future work will concentrate on
this aspect of semantics.